Development and Validation of a Culture-Sensitive Generic Health Literacy Scale in Turkish-Speaking Adults

Background: Improving health literacy has become one of the most important public health-related goals at the global level; however, there is no clear consensus on how to measure health literacy. Although numerous health literacy scales are available in Turkish, none of the existing scales was originally developed and validated at the national level. Objective: This study aimed to develop and validate a culturally appropriate original health literacy scale (HLS) to be used as a reference for the Turkish-speaking literate adult population in Turkey and abroad. Methods: Two multidisciplinary workshops with more than 20 experts were conducted, and a large item pool was developed. The first and second drafts of the scale were pre-tested with 20 and 150 adults, respectively, from different age groups and socioeconomic levels in Ankara. The validity and reliability study of the revised scale (110 items plus 20 self-efficacy statements) was carried out with a household survey of 2,411 adults in 12 provinces randomly selected from the 12 Nomenclature of Territorial Units for Statistics regions in Turkey. Exploratory and confirmatory factor analyses were performed, and fit indices were obtained. Item analysis was applied, and Cronbach's alpha coefficients were calculated. Key Results: The scale was found to be both a valid and a reliable measurement tool to assess health literacy. Cronbach's alpha for the two subdimensions (“disease prevention and health promotion” and “treatment and access to health services”) was 0.79 and 0.91, respectively. Construct validity indices were Root Mean Square Error of Approximation (RMSEA) = 0.043, Goodness of Fit Index (GFI) = 0.96, Normed Fit Index (NFI) = 0.95, and Adjusted Goodness of Fit Index (AGFI) = 0.95. The scale includes “self-efficacy” as an additional dimension (Cronbach's alpha = 0.83, RMSEA = 0.68, GFI = 0.94, NFI = 0.94, and AGFI = 0.91).
Conclusion: HLS is a valid and reliable measurement tool to assess health literacy of Turkish-speaking literate adults with a mixed (objective and subjective) assessment approach. [HLRP: Health Literacy Research and Practice. 2022;6(1):e2–e11.] Plain Language Summary: This study aimed to develop and validate a culturally sensitive original health literacy scale to be used as a reference scale for the Turkish-speaking literate adult population in Turkey and abroad. Study findings showed that HLS is both a valid and a reliable measurement tool to assess health literacy of Turkish-speaking literate adults.

As a complex concept, the definition of health literacy is highly context dependent. This complexity makes it challenging both to agree on a standard definition of health literacy and to develop tools that measure all of its dimensions (Poureslami et al., 2017; Sørensen et al., 2012). In addition, current health literacy measures have been criticized for measuring only reading and numeracy skills, whereas a broader set of skills is necessary for making informed health decisions (Curtis et al., 2015).
In a systematic review of health literacy assessment tools, Altin et al. (2014) found that about one-third of the originally developed instruments used either a direct test of a person's abilities (objective measurement) or the elicitation of self-reported abilities (subjective measurement). The review also found that almost all instruments applied multidimensional measurement and that a majority used a mixed approach; however, according to the authors, "there is no clear indication of the demanded consensus on health literacy measurement" (Altin et al., 2014, p. 10). Pleasant et al. (2011) concluded that there had been only minor developments in measurement formats, even though the academic community was calling for new instruments.
When the scales available in Turkish were examined, all of them were found to have been adapted from scales developed in English. For example, the most widely used scales, namely the Rapid Estimate of Adult Literacy in Medicine (REALM) (Davis et al., 1991), the Newest Vital Sign (NVS) (Weiss et al., 2005), and the Short Test of Functional Health Literacy in Adults (S-TOFHLA) (Baker et al., 1999), were adapted to Turkish by different research groups (Eyüboğlu & Schulz, 2016; Ozdemir et al., 2010). However, Ozdemir et al. (2010) found that two different scales yielded different health literacy levels for the same group of Turkish patients. Other studies have shown that cultural and linguistic differences also affect measurement and sometimes produce contradictory results (Weiss et al., 2005). In a study comparing migrant and host populations, Turkish migrants had higher NVS scores than Dutch participants but lower REALM scores than the same host population (Fransen et al., 2011).
Two of the largest health literacy assessment studies in Turkey used measurement tools that were either adapted from or based on the framework of the European Health Literacy Survey Questionnaire (Abacıgil et al., 2019; Agrali & Akyar, 2018; Durusu Tanrıover et al., 2014; HLS-EU Consortium, 2012). Besides adaptation studies, there have also been some initiatives to develop original measurement tools in Turkish (Sezer & Kadıoğlu, 2014). However, these studies were conducted with patient groups in primary care settings. Hence, despite the numerous health literacy scales available in Turkish, none of the existing scales was originally developed and validated at the national level.
With this significant gap in mind, this study aimed to develop and validate a culturally sensitive original health literacy scale (HLS) to be used as a reference scale for the Turkish-speaking literate adult population in Turkey and abroad.

Conceptualization of Health Literacy
Step 1: Initial development of items. First, widely used international health literacy scales such as REALM (Davis et al., 1991), TOFHLA (Parker et al., 1995), S-TOFHLA (Baker et al., 1999), and NVS (Weiss et al., 2005), as well as adapted or locally developed scales in Turkey (Avcı, 2013; Durusu Tanrıover et al., 2014; Sezer & Kadıoğlu, 2014), were reviewed in terms of their scale development processes and item structures. Then, a matrix was created to measure health literacy at different levels of the cognitive domain and to systematize item development. The horizontal axis of the matrix consisted of four categories (subdimensions) inspired by a comprehensive public health approach to health literacy ("disease prevention," "health promotion," "treatment," and "access to health services"). The vertical axis consisted of three categories of the cognitive domain of Bloom's taxonomy ("knowledge," "comprehension," and "application") (Herr, 2007). On this axis, knowledge was defined as "the remembering of previously learned material"; comprehension as "the ability to grasp the meaning of material"; and application as "the ability to use learned material in new and concrete situations" (Herr, 2007, p. 1) (Table 1). In addition, a subjective measurement part consisting of "self-efficacy" items was added to capture attitude and awareness, drawing on the affective domain of Bloom's taxonomy, which includes feelings, values, appreciation, enthusiasms, motivations, and attitudes (Anderson, 2001). The headings of each category of the matrix and sample items are given in Table 1.
Thereafter, a multidisciplinary workshop was conducted with 20 experts from fields such as public health, family medicine, internal medicine, obstetrics and gynecology, nursing, educational measurement and evaluation, biostatistics, sociology, social work, journalism, communication, Turkish language and literature, and Turkish folklore. During this workshop, the experts used the above-mentioned matrix to develop a 404-item pool composed of multiple-choice, true/false, and matching questions. In addition, 20 self-efficacy statements were added to the scale.
Step 2: Reducing the items (first pre-test): The first draft of the measurement tool was pre-tested with 20 adults chosen haphazardly from different gender and age groups in a district of Ankara. Then, in a second workshop with the same experts, the results of the pre-test were discussed. During this workshop, items that were not properly understood by the pre-test participants were deleted, and some were revised. At the end, the second draft of the scale was developed with 243 items plus 20 self-efficacy statements. No revision was needed in the self-efficacy statements.
Table 1 (excerpt): sample items
Item: "Your neighbor has told you that she has a mole on her shoulder, which got bigger in recent months and started to bleed. Which medical specialty is mostly concerned about this type of health problem?"
Self-efficacy: Please evaluate each statement and choose the most appropriate option (never, sometimes, or always): "I can understand what's written on health-related documents given to me in health care centers"; "I can fill the patient consent forms without assistance"
Step 3: Second pre-test: A second pre-test was conducted with 150 adults aged 18 to 60 years from different age groups and socioeconomic levels. The participants resided in one district of Ankara, and their age and gender distribution was similar to that of the final sample of the validation study. The data were analyzed using item-total correlation analysis. In the end, 110 items with a loading of ≥0.5 were retained for scale construction, and the third draft of the scale was generated with 110 items plus 20 self-efficacy statements.
Step 4: Validation study (community-based):
The priority in the scale development process is to study a group that best reveals the properties of the items in the scale; whether the sample represents the population does not matter. The information obtained from the scale being developed is used to interpret its items, not to generalize to the population from which the sample is drawn. Accordingly, some authors suggest that the sample size should be a certain multiple of the number of items in the newly developed scale (Erkuş, 2012; Stevens, 1996), whereas others state that samples of 500 to 1,000 people are good or very good (Comrey & Lee, 1996). The sample size of our study was determined on this basis.
The Turkish Statistical Institute divides Turkey into 12 Nomenclature of Territorial Units for Statistics (NUTS) regions according to economic and cultural/social characteristics, and one province was randomly selected from each region. From each province, 200 adults (with equal numbers of men and women in the age groups 18-29, 30-39, and 40-60 years) were selected by convenience sampling, for a target total of 2,400 participants, to perform the validation study of the scale. The household survey reached 2,466 adults, because 66 additional participants were recruited in some provinces for convenience. Data analysis was performed with 2,411 adults after 55 health worker participants were removed.
After performing the validity and reliability analysis, 71 items plus 16 self-efficacy statements remained for the final version of the scale. See Figure 1 for the study flow chart.
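The sample-size bookkeeping in Step 4 can be checked with a few lines of Python. This is a minimal sketch that uses only the counts reported above; the participants-per-item ratio is illustrative, since the text does not state which multiple of the item count served as the target.

```python
# Participant flow for the validation study, using counts reported in the text.
N_REGIONS = 12             # NUTS regions, one province drawn from each
TARGET_PER_PROVINCE = 200  # adults targeted per province
ADDITIONAL = 66            # extra participants recruited in some provinces
HEALTH_WORKERS = 55        # excluded from the main analysis
DRAFT_ITEMS = 110 + 20     # third draft: items plus self-efficacy statements


def participant_flow():
    """Return (target, recruited, analyzed) sample sizes."""
    target = N_REGIONS * TARGET_PER_PROVINCE
    recruited = target + ADDITIONAL
    analyzed = recruited - HEALTH_WORKERS
    return target, recruited, analyzed


target, recruited, analyzed = participant_flow()
# Illustrative only: the text does not state the items-to-participants multiple used.
ratio = analyzed / DRAFT_ITEMS
print(f"target={target}, recruited={recruited}, analyzed={analyzed}")
print(f"participants per draft item: {ratio:.1f}")  # 18.5
```

Even the analyzed sample of 2,411 is well above the 500 to 1,000 range cited above and corresponds to roughly 18 participants per draft item.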

Statistical Analysis
Data analyses involved calculating the discrimination index and the means and standard deviations of the items, as well as validity and reliability measures for the scale. Stratified Cronbach's alpha and McDonald's omega coefficients were used for overall reliability, whereas Cronbach's alpha and split-half reliability statistics were used to test the reliability of the subtests of the scale. Confirmatory factor analysis based on polychoric correlations was performed to establish the construct validity of the two-dimensional scale. In addition, a one-dimensional model and a second-order confirmatory factor analysis model were fitted, and the results were compared with those of the two-dimensional model. Each item of the scale was scored as 1 for a correct answer and 0 for a wrong answer. Self-efficacy statements were scored as 1 (never), 2 (sometimes), and 3 (always). SPSS version 23.0, R version 4.0.0, and FACTOR 10.8.02 were used for data analyses.
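Stratified alpha combines the reliability of each subscale with the variance of its total score. The sketch below assumes the standard formula alpha_strat = 1 - Σ_j var(X_j)(1 - alpha_j) / var(X), where X_j is a subscale total and X is the overall total. The function names and the toy 0/1 responses are hypothetical; they are not the study's data or its SPSS/R/FACTOR workflow.

```python
from statistics import pvariance


def column_totals(items):
    """Sum item columns into one total score per respondent."""
    return [sum(row) for row in zip(*items)]


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns for the same respondents."""
    k = len(items)
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(column_totals(items)))


def stratified_alpha(subscales):
    """Stratified alpha: 1 - sum(var(X_j) * (1 - alpha_j)) / var(X)."""
    sub_totals = [column_totals(items) for items in subscales]
    grand_total = [sum(scores) for scores in zip(*sub_totals)]
    penalty = sum(
        pvariance(tot) * (1 - cronbach_alpha(items))
        for items, tot in zip(subscales, sub_totals)
    )
    return 1 - penalty / pvariance(grand_total)


# Hypothetical 0/1 scores for six respondents (each inner list is one item).
subscale_a = [[1, 1, 1, 0, 0, 1], [1, 1, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1]]
subscale_b = [[1, 1, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0]]
print(round(stratified_alpha([subscale_a, subscale_b]), 3))  # 0.82
```

Stratifying by subscale avoids penalizing a deliberately multidimensional scale, such as the two-dimensional HLS, for the lower inter-item correlations that arise across dimensions.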
Ethical approval for the study was obtained from the Hacettepe University Non-Interventional Clinical Researches Ethics Board, and official permission was obtained from the Ministry of Interior.

The Analysis for Health Literacy Scale
Item analysis: After the second pre-test, public health and educational measurement and evaluation experts again discussed the appropriateness of the items and their distribution across the levels of Bloom's taxonomy, and made the necessary item revisions. After the household survey, the agreement between the expert opinions and the survey results was checked using the correlations between the items.
In this analysis, item frequencies, difficulty, and discrimination were evaluated. The items generally showed positive correlations. Item difficulty ranged from 0.11 to 0.97 across the knowledge, comprehension, and application dimensions, and from 0.07 to 0.97 across the disease prevention, health promotion, treatment, and access to health services subscales. Within each Bloom's taxonomy dimension, all correlations of items with the sum of the other items were positive, and these correlations were consistent with the expert opinions.
Items with negative item-total correlations or item-total correlations below 0.20 were excluded from the scale. In addition to the statistical values, the effect of each item on the content validity of the scale and the opinions of the four public health experts were taken into consideration when removing items. Based on the results, the four subscales shown in Table 1 had to be recombined into two subscales: "disease prevention-health promotion" and "treatment-access to health services." Item analysis was then repeated for the two-dimensional structure, and the statistics were found to be sufficient. In the final scale, which consists of 71 items (reduced from 110 at the beginning of the analysis), item difficulty ranged from 0.29 to 0.87 and discrimination (point-biserial correlation) from 0.21 to 0.45 for the "disease prevention-health promotion" dimension; for the "treatment-access to health services" dimension, the corresponding ranges were 0.20 to 0.97 and 0.09 to 0.56.
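The item statistics described above can be sketched as follows, assuming that difficulty is the proportion of correct answers and that discrimination is the point-biserial correlation of each 0/1 item with the sum of the other items (the corrected item-total correlation). The 0.20 cutoff comes from the text; the data and function names are hypothetical, and the sketch does not reproduce the expert review that also informed item removal.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 item this equals the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # fails if an item or rest score is constant


def item_analysis(responses, min_r=0.20):
    """Return (difficulty, corrected item-total r, keep) for each item.

    `responses` holds one list of 0/1 scores per respondent. An item is
    flagged for removal when its correlation with the sum of the other
    items is negative or below `min_r`.
    """
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total without item j
        difficulty = sum(item) / len(item)              # proportion correct
        r = pearson(item, rest)
        results.append((difficulty, r, r >= min_r))
    return results


# Hypothetical responses: five respondents, three items.
responses = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
for j, (p, r, keep) in enumerate(item_analysis(responses), start=1):
    print(f"item {j}: difficulty={p:.2f}, r={r:.2f}, keep={keep}")
```

Excluding the item itself from the total avoids inflating its correlation, which matters most when a subscale has few items.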
Confirmatory factor analysis: Construct validity was examined with confirmatory factor analysis (CFA) for the two-dimensional health literacy scale without first performing exploratory factor analysis (EFA), because there was a theoretical basis for two-dimensionality.
In addition, the two-dimensional model was compared with one-dimensional and second-order CFA models. The one-dimensional and two-dimensional models both showed a very good fit, with quite similar fit indices. The second-order CFA model also showed a good fit; however, the two-dimensional model had the best fit indices of the three. In the CFA performed to confirm the two-dimensionality of the scale, all indices met the criteria very well except chi-square/degree of freedom. Because the chi-square statistic becomes very large when the sample size exceeds 200 participants, the chi-square goodness-of-fit results were disregarded (Kline, 2011). The other fit indices supported the construct validity of the scale (Table 2). The correlation between the factor scores of the two dimensions (disease prevention-health promotion and treatment-access to health services) was 0.90 (p < .001). This high correlation strongly supports the additivity of the two dimensions and indicates that the total scores and subscale scores can be used together.
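The sample-size sensitivity of chi-square can be illustrated with a small calculation. This sketch assumes maximum-likelihood-style scaling, chi-square = (N - 1) * F, where F is the minimized fit-function value, and the usual RMSEA formula sqrt(max(chi-square - df, 0) / (df * (N - 1))); some programs use N instead of N - 1. The misfit value and degrees of freedom are hypothetical, and the study's polychoric-based CFA uses estimators and corrections that this sketch does not reproduce.

```python
from math import sqrt


def model_chi_square(f_min, n):
    """Model chi-square for a fixed minimum fit-function value (ML-style scaling)."""
    return (n - 1) * f_min


def rmsea(chi2, df, n):
    """Root mean square error of approximation."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical model with the same degree of misfit at every sample size.
F_MIN, DF = 0.4, 100
for n in (500, 1000, 2411):
    chi2 = model_chi_square(F_MIN, n)
    print(f"N={n:5d}  chi2/df={chi2 / DF:5.2f}  RMSEA={rmsea(chi2, DF, n):.3f}")
```

For the same misfit, chi-square/df grows linearly with N (about 2 at N = 500 and about 9.6 at N = 2,411), whereas RMSEA stays below its limit of sqrt(F/df), about 0.063 here. This is why chi-square/df is routinely set aside for samples as large as this one.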

The Analysis for Self-Efficacy Part
The self-efficacy part of the scale was designed as a one-dimensional instrument, and EFA, CFA, and reliability analyses were performed. The data were randomly divided into two halves: EFA was applied to the first half, and CFA was applied to the second half to confirm the factor structure of the scale (both analyses were based on polychoric correlations). Reliability analysis was performed on the whole dataset.
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (Kaiser, 1970, 1974) showed that the sample was adequate for EFA (KMO = 0.91). Bartlett's test of sphericity, which tests whether the correlation matrix differs significantly from the identity matrix, was significant (p < .001). These results indicated that the data were suitable for exploratory factor analysis. EFA was completed in two steps. In the first step, all 20 statements were analyzed. In the second step, the analysis was repeated after four statements with low factor loadings were removed from the scale. Parallel analysis supported a one-dimensional structure. In the repeated EFA, the KMO value was 0.90, indicating that the sample size was sufficient, and Bartlett's test of sphericity was again statistically significant (p < .001), indicating that the correlations among the variables were sufficient for factor analysis. Unweighted least squares, one of the most suitable methods for non-normally distributed data, was used as the estimation method in CFA. The path diagram showing the standardized factor loadings of the self-efficacy part is given in Figure 2.
Based on the path coefficients, modifications were made between some of the statements. In making these modifications, the following were taken into consideration: error variances, whether the items were related within the single dimension, keeping the number of modifications low, and a significant improvement in the chi-square goodness-of-fit value. For this model, all fit indices except chi-square/degree of freedom met the model fit criteria at a good level (Table 2). These fit indices for the one-dimensional model provide important evidence for the construct validity of the self-efficacy part.
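The KMO and Bartlett checks used before the self-efficacy EFA can be computed directly from a correlation matrix. The sketch below uses the textbook formulas: Bartlett's chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, and overall KMO = Σr² / (Σr² + Σa²) over off-diagonal elements, where a is the partial correlation derived from the inverse of R. The matrix, sample size, and function names are hypothetical; the study computed these statistics in FACTOR from polychoric correlations. A p-value would come from the chi-square survival function (for example, `scipy.stats.chi2.sf`), which is omitted here to keep the sketch dependency-free.

```python
from math import log, sqrt


def inverse_and_det(m):
    """Gauss-Jordan inverse and determinant of a small square matrix."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        pv = a[col][col]
        det *= pv
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a], det


def bartlett_sphericity(corr, n_obs):
    """Bartlett's chi-square statistic and df for H0: correlation matrix = identity."""
    p = len(corr)
    _, det = inverse_and_det(corr)
    chi2 = -((n_obs - 1) - (2 * p + 5) / 6) * log(det)
    return chi2, p * (p - 1) // 2


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    p = len(corr)
    inv, _ = inverse_and_det(corr)
    r2 = a2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # partial correlation of i and j controlling for all other variables
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                a2 += partial ** 2
    return r2 / (r2 + a2)


# Hypothetical correlations among three self-efficacy statements.
R = [[1.0, 0.55, 0.48], [0.55, 1.0, 0.52], [0.48, 0.52, 1.0]]
chi2, df = bartlett_sphericity(R, n_obs=1200)
print(f"KMO = {kmo(R):.2f}; Bartlett chi2 = {chi2:.1f}, df = {df}")
```

KMO approaches 1 when partial correlations are small relative to the raw correlations, which is what makes a common-factor model plausible. With only two variables it equals exactly 0.5, so the sketch's example uses three.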

The Reliability Analysis of Self-Efficacy Part
Split-half reliability, Cronbach's alpha, Guttman's lambda-2, and McDonald's omega coefficients were calculated to evaluate the reliability of the self-efficacy part. Cronbach's alpha (0.83), McDonald's omega (0.88), Guttman's lambda-2 (0.83), and split-half reliability (0.73) coefficients were all higher than 0.70. These results showed that the self-efficacy part was quite reliable.
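Three of these coefficients can be computed from item covariances with the standard formulas: lambda-1 = 1 - Σvar(item)/var(total), alpha = k/(k - 1) × lambda-1, and lambda-2 = lambda-1 + sqrt(k/(k - 1) × Σ_{i≠j} cov²) / var(total). The text does not say how the halves were formed or whether the split-half value was stepped up with the Spearman-Brown formula, so the odd-even split and the reporting of both values here are assumptions. The ratings are hypothetical 1-3 scores, and McDonald's omega is omitted because it requires a fitted factor model.

```python
from math import sqrt


def cov(x, y):
    """Population covariance of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def alpha_and_lambda2(items):
    """Cronbach's alpha and Guttman's lambda-2; `items` is a list of item columns."""
    k = len(items)
    total = [sum(row) for row in zip(*items)]
    var_total = cov(total, total)
    lambda1 = 1 - sum(cov(c, c) for c in items) / var_total
    off_diag_sq = sum(cov(items[i], items[j]) ** 2
                      for i in range(k) for j in range(k) if i != j)
    alpha = k / (k - 1) * lambda1
    lambda2 = lambda1 + sqrt(k / (k - 1) * off_diag_sq) / var_total
    return alpha, lambda2


def split_half(items):
    """Odd-even split-half correlation and its Spearman-Brown step-up."""
    odd = [sum(row) for row in zip(*items[0::2])]
    even = [sum(row) for row in zip(*items[1::2])]
    r = cov(odd, even) / sqrt(cov(odd, odd) * cov(even, even))
    return r, 2 * r / (1 + r)


# Hypothetical 1-3 ratings (never/sometimes/always) from six respondents.
items = [
    [3, 2, 3, 1, 2, 3],
    [3, 2, 2, 1, 2, 3],
    [2, 2, 3, 1, 1, 3],
    [3, 1, 3, 2, 2, 3],
]
alpha, lambda2 = alpha_and_lambda2(items)
r_half, r_sb = split_half(items)
print(f"alpha={alpha:.2f}  lambda2={lambda2:.2f}  split-half r={r_half:.2f} (Spearman-Brown {r_sb:.2f})")
```

Lambda-2 is never smaller than alpha, so values that are close together, as in the reported 0.83 and 0.83, suggest that the items contribute fairly evenly to the total score.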

Criterion Validity
Health workers were expected to have higher health literacy scores than others. On this basis, the scores of the 55 health workers excluded from the main analysis were used to assess the criterion validity of the newly developed HLS. From the study group of 2,411 non-health worker participants, 55 were randomly selected to match the health worker participants by age and gender group. The mean scores on the scale and on the self-efficacy part were compared between health workers and non-health workers; the differences were statistically significant (Table 3); hence, the criterion validity of the scale was confirmed.
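The matched comparison can be sketched in two steps: draw from the non-health workers the same number of people in each age and gender stratum as among the health workers, then compare the group means. The text does not name the test behind Table 3, so Welch's t statistic is used here only as one reasonable choice. A p-value would come from the t distribution (for example, `scipy.stats.ttest_ind(x, y, equal_var=False)`). The record layout, scores, and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean, variance


def matched_sample(cases, pool, key, seed=0):
    """Draw from `pool` the same number of people per stratum as in `cases`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        strata[key(person)].append(person)
    sample = []
    for stratum, count in Counter(key(p) for p in cases).items():
        # random.Random.sample raises ValueError if a stratum is too small
        sample.extend(rng.sample(strata[stratum], count))
    return sample


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical records: (sex, age group, total score).
health_workers = [("F", "18-29", 62), ("F", "30-39", 60), ("M", "40-60", 59), ("M", "18-29", 63)]
others = [(s, a, score) for s in ("F", "M") for a in ("18-29", "30-39", "40-60")
          for score in (48, 52, 55, 57)]
controls = matched_sample(health_workers, others, key=lambda p: (p[0], p[1]))
t, df = welch_t([p[2] for p in health_workers], [p[2] for p in controls])
print(f"t = {t:.2f}, df = {df:.1f}")
```

Frequency matching on age and gender keeps those variables from explaining the score gap, so the remaining difference is more plausibly attributable to health training, which is the premise of this known-groups check.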

DISCUSSION
Improving health literacy has become one of the most important health-related goals at the global level. The World Health Organization (2008) emphasizes that health literacy is one of the fundamental determinants of health and that all countries need to monitor and improve health literacy levels continuously. Low health literacy is consistently associated with a poorer ability to interpret health messages, a higher frequency of risky health behaviors, inappropriate use of health services, more hospitalizations, and poorer performance in rational medicine use (Berkman et al., 2011; Howard et al., 2005; Selden et al., 2000).
Health literacy scales are important tools for assessing the health literacy of populations and for planning and evaluating evidence-based health education and promotion interventions. However, there are significant differences in how health literacy is defined and in which dimensions of the concept are taken into consideration (Poureslami et al., 2017; Sørensen et al., 2012). Several articles have emphasized that health literacy involves a combination of skills, including the ability to interpret documents and to read and write prose (print literacy), to use quantitative information (numeracy or quantitative literacy), and to communicate effectively (oral literacy) (Altin et al., 2014; Berkman et al., 2010). One of the most widely discussed approaches classifies literacy into three levels: basic/functional literacy, communicative/interactive literacy, and critical literacy. In this comprehensive approach, progressively higher levels of literacy allow for greater autonomy and personal empowerment (Nutbeam, 2000, 2008). By comparison, widely used tools assess narrower sets of skills: TOFHLA consists of two parts and assesses reading comprehension and numeracy (Parker et al., 1995); REALM is a quick screening tool identifying patients' reading levels (Davis et al., 1991); REALM-R is only a word recognition test (Bass et al., 2003); and the NVS measures reading and interpretation skills (Weiss et al., 2005). Building on the comprehensive approach, our HLS used the three levels of the cognitive domain of Bloom's taxonomy ("knowledge," "comprehension," and "application") to develop its items, together with self-efficacy statements based on the affective domain.
Some other adapted health literacy scales exist in Turkey (Abacıgil et al., 2019; Durusu Tanrıover et al., 2014; Ozdemir et al., 2010). However, no original scale had been developed in Turkish for healthy adults and validated in a sample representing the country. This is the first original Turkish scale of its kind. Our findings showed that HLS is both a valid and a reliable measurement tool to assess the health literacy of Turkish-speaking literate adults.
In the past, adapting a measurement tool to another language was seen primarily as a linguistic task. Today, when a measurement tool is used in different cultures and languages, cultural adaptation, in addition to preservation of its linguistic content, is considered necessary. However, cultural elements are often not sufficiently addressed in adaptation studies. The wide use of a scale in Western cultures does not necessarily mean that it will fit well into Turkish culture (Çapık et al., 2018). The term adaptation refers to transferring a scale from one language to another (International Test Commission [ITC], 2018). Very faithful, very free, or very literary translations of the original text may not be culturally appropriate and can make it difficult to achieve equivalence between two languages (Çapık et al., 2018). Differences in language structure can also cause problems in test translation. For example, the fill-in-the-blank format is inappropriate in Turkish, where the object of a sentence must come before the verb and subject. Because Turkish readers must first look at the end of a sentence before they can fill in its beginning, the incomplete sentences used in the English versions of health literacy scales would completely change answering behavior (ITC, 2018). Two questions from the TOFHLA illustrate this: "I . . . to provide the county information to . . . any statements given in this . . . and hereby give permission to the . . . to get such proof I . . . that for Medicaid I must report any . . . in my circumstances within . . . (10) days of becoming . . . of the change"; "You must have an . . . stomach when you come for . . . " (McWhorter, 2019). If the order of subject and verb in Turkish and English is not considered in translation, misunderstandings may arise (ITC, 2018).
Weiss et al. (2005) experienced a similar problem in their study of the NVS, whose original language is English. According to the authors, "the psychometric properties of the Spanish version of the NVS, were not as good as those of the English version. This fact may stem from the greater heterogeneity of language and culture among Spanish-speaking patients" (Weiss et al., 2005, p. 521).
Our original HLS was developed in a linguistically and culturally appropriate manner with the contribution of 20 experts from the fields of public health, family medicine, internal medicine, obstetrics and gynecology, nursing, educational sciences, biostatistics, sociology, social work, journalism, communication, Turkish language and literature, and Turkish folklore. This combination of experts ensured the appropriateness of both the cultural content and the Turkish grammatical structure, as well as of the health content.
For TOFHLA, REALM, and NVS, the validation studies were performed on patients admitted to hospital outpatient clinics or to public and private primary care clinics (Davis et al., 1991; Parker et al., 1995; Weiss et al., 2005). However, patients may have particular characteristics and specific health literacy levels. For this reason, it is important to develop a scale for determining the health literacy level of the healthy general population. In our study, data were gathered at the national level through a household survey of healthy people in 12 of Turkey's 81 provinces, one randomly selected from each NUTS region. Working with a community-based sample by visiting households, rather than with specific patient or service user groups in health care settings, and applying the scale to a geographically and culturally diverse population are further significant strengths of our original scale.
In conclusion, we suggest that different research groups can use this original HLS to assess the health literacy level of Turkish-speaking adults living in Turkey and abroad. The scale will enable standard monitoring, evaluation, and comparison of health literacy levels among different adult population groups; it will also help assess the long-term effect of health education and health promotion policies and programs. Even though the scale was developed for Turkish-speaking adults, we believe that it can also be adapted to similar languages and cultures in the Eastern Europe and Central Asia region.
Table 3 notes: (a) Statistically significant difference between health workers and non-health workers in total scale scores, p < .0001. (b) Statistically significant difference between health workers and non-health workers in total scores of the self-efficacy part, p = .001.

STUDY LIMITATIONS
The current study had several limitations. Study participants were literate adults aged 18 to 60 years; therefore, it may not be appropriate to use this scale with adolescents or older adults without further validity and reliability studies. Because the scale is self-administered, it may not be suitable for people with very low educational attainment. Even though participants were recruited from different regions of Turkey, the use of convenience sampling may limit the external validity of the findings.